Building a Diverse Document Leads Corpus Annotated with Semantic Relations

Authors

Masatsugu Hangyo

Daisuke Kawahara

Sadao Kurohashi

Abstract

Semantic analysis has been actively studied in natural language processing in recent years. Corpora with semantic annotations are essential for this work. Although such corpora exist for newspaper articles, documents span many other genres and styles, which contain linguistic expressions not found in newspaper articles. In this paper, we build a diverse document leads corpus annotated with semantic relations. To reduce annotator workload and cover as wide a variety of documents as possible, we restrict the annotation target in each document to its first three sentences. We have completed a corpus of 1,000 documents and report its statistics.

Similar resources

Building a Corpus of Temporal-Causal Structure

While recent corpus annotation efforts cover a wide variety of semantic structures, work on temporal and causal relations is still in its early stages. Annotation efforts have typically considered either temporal relations or causal relations, but not both, and no corpora currently exist that allow the relation between temporals and causals to be examined empirically. We have annotated a corpus...

Automatic Acquisition of the Argument-Predicate Relations from a Frame-Annotated Corpus

This paper presents an approach to automatic acquisition of the argument-predicate relations from a semantically annotated corpus. We use SALSA, a German newspaper corpus manually annotated with role-semantic information based on frame semantics. Since the relatively small size of SALSA does not allow us to estimate the semantic relatedness in the extracted argument-predicate pairs, we use a larger...

MEANTIME, the NewsReader Multilingual Event and Time Corpus

In this paper, we present the NewsReader MEANTIME corpus, a semantically annotated corpus of Wikinews articles. The corpus consists of 480 news articles, i.e. 120 English news articles and their translations in Spanish, Italian, and Dutch. MEANTIME contains annotations at different levels. The document-level annotation includes markables (e.g. entity mentions, event mentions, time expressions, ...

Semantic Annotation of the ACL Anthology Corpus for the Automatic Analysis of Scientific Literature

This paper describes the process of creating a corpus annotated for concepts and semantic relations in the scientific domain. A part of the ACL Anthology Corpus was selected for annotation, but the annotation process itself is not specific to the computational linguistics domain and could be applied to any scientific corpus. Concepts were identified and annotated fully automatically, based on a...

Towards a Balanced Named Entity Corpus for Dutch

This paper introduces a new named entity corpus for Dutch. State-of-the-art named entity recognition systems require a substantial annotated corpus to be trained on. Such corpora exist for English, but not for Dutch. The STEVIN-funded SoNaR project aims to produce a diverse 500-million-word reference corpus of written Dutch, with four semantic annotation layers: named entities, coreference rela...

Publication date: 2012
